renv::restore()
- The library is already synchronized with the lockfile.
Olink Explore HT is a high-throughput (HT) proteomic platform that combines the specificity of the Proximity Extension Assay (PEA) with a Next-Generation Sequencing (NGS) readout.
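When the Plate Controls in a dataset come from a different lot than the reference Plate Control lot used by the analysis pipeline, an internal Plate Control (PC) Lot Factor can be added to the Plate Control ExtNPX values of each assay \(i\) to align them with that reference:

\[\text{ExtNPX}_{i,\text{PC}}(\text{adjusted}) = \text{ExtNPX}_{i,\text{PC}}(\text{raw}) + \text{PC Lot Factor}_i\]

Because the lot factor is a constant within assay \(i\), and \(\text{median}(x + c) = \text{median}(x) + c\), PC normalization of sample \(j\) against the adjusted Plate Controls can be written in terms of the raw Plate Control values:

\[\text{NPX}_{i,j} = \text{ExtNPX}_{i,j} - \text{median}\big(\text{ExtNPX}_{i,\text{PC}}(\text{raw})\big) - \text{PC Lot Factor}_i\]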
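The equivalence of the two forms can be checked numerically. The sketch below uses made-up ExtNPX values for a single assay and a hypothetical lot factor (none of these numbers come from rp2 or from Olink's pipeline) and confirms that shifting the Plate Controls first and then subtracting their median gives the same NPX as the closed-form expression.

```r
# Hypothetical ExtNPX values for a single assay i (illustrative only)
ext_pc_raw    <- c(3.1, 2.9, 3.4, 3.0)  # Plate Control wells, raw ExtNPX
ext_samples   <- c(4.2, 1.8, 3.6)       # study samples j on the same plate
pc_lot_factor <- 0.25                   # hypothetical PC Lot Factor for assay i

# Route 1: shift the Plate Controls by the lot factor, then subtract their median
ext_pc_adjusted <- ext_pc_raw + pc_lot_factor
npx_route1 <- ext_samples - median(ext_pc_adjusted)

# Route 2: closed form, subtracting the raw Plate Control median and the lot factor
npx_route2 <- ext_samples - median(ext_pc_raw) - pc_lot_factor

print(npx_route2)  # 0.9 -1.5 0.3
all.equal(npx_route1, npx_route2)
```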
The platform relies on a hierarchy of controls (internal control assays plus plate, negative, and sample controls) to ensure data quality (ExploreHT_QC.pdf).
There is a documented discrepancy between manufacturer-reported reliability and independent study results.
Official metrics report high precision (ExploreHT_Validation.pdf):

* IntraCV (within-plate): median ~11.2%
* InterCV (between-plate): median ~8.7%

| Block | # of assays | Dilution factor | Intra-assay %CV mean | Inter-assay %CV mean |
|---|---|---|---|---|
| 1 | 742 | 1:1 | 23.3 | 20.7 |
| 2 | 1314 | 1:1 | 13.3 | 11.8 |
| 3 | 1204 | 1:1 | 9.8 | 7.1 |
| 4 | 1106 | 1:1 | 7.2 | 3.5 |
| 5 | 582 | 1:10 | 6.6 | 3.8 |
| 6 | 270 | 1:100 | 5.6 | 5.3 |
| 7 | 134 | 1:1000 | 11.0 | 6.2 |
| 8 | 68 | 1:100,000 | 8.6 | 12.4 |
Olink assesses CVs in validation on a subset of 291 selected assays (~5% of the panel). This subset consists of proteins that are typically well expressed in healthy plasma, which allows reliable CV values to be calculated.
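As a rough cross-check, the block-level means in the table above can be pooled into a single assay-weighted mean. This is a mean of block means, not the median quoted in the official metrics, so the two are only loosely comparable; the numbers below are transcribed directly from the table.

```r
# Block-level validation CVs transcribed from the table above
blocks <- data.frame(
  block    = 1:8,
  n_assays = c(742, 1314, 1204, 1106, 582, 270, 134, 68),
  intra_cv = c(23.3, 13.3, 9.8, 7.2, 6.6, 5.6, 11.0, 8.6),
  inter_cv = c(20.7, 11.8, 7.1, 3.5, 3.8, 5.3, 6.2, 12.4)
)

total_assays  <- sum(blocks$n_assays)                           # 5420
pooled_intra  <- weighted.mean(blocks$intra_cv, blocks$n_assays) # assay-weighted intra-assay %CV
pooled_inter  <- weighted.mean(blocks$inter_cv, blocks$n_assays) # assay-weighted inter-assay %CV
round(c(intra = pooled_intra, inter = pooled_inter), 2)         # intra 11.43, inter 8.97
```

Blocks with more assays pull the pooled value toward their own CVs, so blocks 1 and 2 (742 assays at 23.3% and 1314 assays at 13.3%) weigh heavily in the intra-assay result.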
The limit of detection (LOD) is the threshold above which a protein signal is statistically distinguishable from the Negative Control background.

Reliability is strongly tied to the signal-to-noise ratio (Rooney2025_ARIC.pdf):

* CV is inversely correlated with the percentage of samples above LOD (\(r = -0.77\)); precision therefore improves as detectability increases.
* Assays where \(NPX < LOD\) are dominated by technical noise, which inflates their CVs.
| Column | Description | Type | Typical value |
|---|---|---|---|
| SampleID | The annotated sample ID | String | |
| SampleType | Type of sample | String | PLATE_CONTROL, NEGATIVE_CONTROL, SAMPLE_CONTROL, SAMPLE |
| WellID | ID of the well | String | Capital letter A–H followed by a number 1–12 |
| PlateID | Name of the plate the sample was run on | String | |
| DataAnalysisRefID | Reference ID for data analysis | String | |
| OlinkID | OlinkID for assay | String | |
| UniProt | UniProt ID for assay | String | |
| Assay | Gene name for assay | String | |
| AssayType | Type of assay | String | assay, amp_ctrl, inc_ctrl, ext_ctrl |
| Panel | Panel name | String | Explore_HT |
| Block | Name of the block the sample was run on | String | 1, 2, 3, 4, 5, 6, 7, or 8 |
| Count | The total number of counts | Integer | Greater than or equal to 1 |
| ExtNPX | Intermediate value between count and NPX: log2 of the ratio between data-point Count value and the count for the Extension Control assay for the same sample. | Double | -1.94701 |
| NPX | NPX value | Double | |
| Normalization | Type of normalization used in project | String | Plate control, Intensity or EXCLUDED |
| PCNormalizedNPX | NPX value displayed if plate control normalization has been chosen. | Double | 1.735509 |
| AssayQC | Overall QC status for an assay | String | NA, PASS, WARN |
| SampleQC | Overall QC status for a sample in a block | String | NA, PASS, WARN, FAIL |
| ExploreVersion | Software version of the module in NPX Explore HT & 3072 | String | |
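Following the ExtNPX definition in the table, a data point's ExtNPX is the log2 ratio of its Count to the Extension Control count in the same sample. A minimal sketch with hypothetical counts (not taken from rp2); a ratio of 1 gives ExtNPX = 0, consistent with the extension-control rows in rp2_npx showing ExtNPX = 0, and every halving of the count lowers ExtNPX by one unit.

```r
# Hypothetical counts for one sample (illustrative only)
assay_counts   <- c(assay_A = 4400, assay_B = 275, assay_C = 17600)
ext_ctrl_count <- 17600  # count of the Extension Control assay in the same sample

# ExtNPX = log2(Count / Extension Control count)
ext_npx <- log2(assay_counts / ext_ctrl_count)
ext_npx
# assay_A assay_B assay_C
#      -2      -6       0
```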
Olink delivers normalized Parquet outputs; the normalization method is chosen on request or according to the experimental design (e.g., whether samples were randomized across plates). Intensity normalization is generally recommended as the primary method. In that case, the standard NPX column contains Intensity-normalized values, while PC-normalized values remain available in the dedicated PCNormalizedNPX column.
library(tidyverse)
library(magrittr)
library(OlinkAnalyze)
library(knitr)
library(kableExtra)
source("R/olink_helpers.R")
rp2_olink_fn <- "/Volumes/DCEG/CGF/Laboratory/Projects/MR-0084/RP0084-045/Data/NPXMap Exports/RP0084-045_Extended_NPX_2025-10-27.parquet"
d1_olink_fn <- "/Volumes/DCEG/CGF/Laboratory/Projects/DESL Aliquoting Projects/NAS_CS036024/Olink_DataDelivery/Q-13387_Hutchinson_Extended_NPX_2024-06-26.parquet"
rp2_npx <- read_NPX(rp2_olink_fn)
ℹ This parquet file is for research use only:
"For Research Use Only. Not for use in diagnostic procedures."!
Multiple quantification columns detected (NPX, PCNormalizedNPX, Count). NPX will be used for downstream analysis.
ℹ Outdated Data Analysis Reference ID and Panel Archive Version combination detected.
→ Re-export data using Panel Archive Version 1.5.0+ anduse the newest version of the Fixed LOD file when calculating LOD (Version 6+).
! Failure to re-export may result in incorrect PC normalization across lots and Fixed LOD calculations.
dim(rp2_npx)
[1] 783360 30
rp2_npx %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = FALSE) %>%
scroll_box(width = "100%")
| SampleID | SampleType | WellID | PlateID | DataAnalysisRefID | OlinkID | UniProt | Assay | AssayType | Panel | Block | Count | ExtNPX | NPX | Normalization | PCNormalizedNPX | AssayQC | SampleQC | SoftwareVersion | SoftwareName | PanelDataArchiveVersion | PreProcessingVersion | PreProcessingSoftware | InstrumentType | IntraCV | InterCV | SampleBlockQCWarn | SampleBlockQCFail | BlockQCFail | AssayQCWarn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IO_3862_1158_1 | SAMPLE | A7 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 17461 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 |
| IR_8487_1158_1 | SAMPLE | A8 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 16613 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 |
| KF_0956_1158_1 | SAMPLE | A9 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 2220 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 |
| UM_0172_1035_1 | SAMPLE | A10 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 14407 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 |
| IA_7060_1158_1 | SAMPLE | A11 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 1231 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 |
| Sample_12 | SAMPLE_CONTROL | A12 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 17385 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 0 | 1 | 1 | 0 |
# derive per-well sample information (SampleID, SampleType, WellID, PlateID) from the parquet data frame
rp2_sam <- rp2_npx %>% select(1:4) %>%
unique %>%
rename(plate = PlateID, well = WellID) %>%
mutate(column = paste0("Column ", substr(well, 2, 3))) %>%
mutate(row = substr(well, 1, 1))
rp2_sam
There are 144 samples in project rp2 and 5440 assays per sample, which accounts for every row of rp2_npx: 5440 × 144 = 783,360.
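The row-count arithmetic can also be verified programmatically: a long-format NPX table is a complete sample-by-assay grid when it has exactly one row per (SampleID, OlinkID) pair. The sketch below uses a small hypothetical table and a hypothetical helper, is_complete_grid; applying it to rp2_npx assumes only that the table has SampleID and OlinkID columns, as shown above.

```r
# Hypothetical long-format table: 3 samples x 4 assays
npx_long <- expand.grid(SampleID = c("S1", "S2", "S3"),
                        OlinkID  = c("OID1", "OID2", "OID3", "OID4"),
                        stringsAsFactors = FALSE)

# TRUE when every (SampleID, OlinkID) pair occurs exactly once
is_complete_grid <- function(df) {
  n_samples <- length(unique(df$SampleID))
  n_assays  <- length(unique(df$OlinkID))
  nrow(df) == n_samples * n_assays &&
    anyDuplicated(df[, c("SampleID", "OlinkID")]) == 0
}

is_complete_grid(npx_long)         # TRUE
is_complete_grid(npx_long[-1, ])   # FALSE: one pair is missing
```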
OlinkAnalyze::olink_displayPlateLayout(data = rp2_sam, fill.color = "SampleType")
inspect_olink_qc
The OlinkAnalyze package contains an internal function, npxCheck(), which validates NPX data integrity and identifies samples or assays whose values are entirely missing (NA). However, it does not report Sample-Block QC status in detail, nor does it explicitly list assays or samples flagged with a “WARN” status. To address this, we define the custom function inspect_olink_qc, which extends the standard validation by summarizing Olink QC flags block by block and gives a clearer overview of data quality for downstream analysis.
inspect_olink_qc <- function(dat) {
# 1. Identify Assays that are not "PASS" (excluding controls)
# Filtering for AssayType=="assay" ensures we focus on actual targets
flagged_assays <- dat |>
filter(AssayType == "assay") |>
select(OlinkID, Assay, Block, AssayQC) |>
distinct() |>
filter(AssayQC != "PASS")
# 2. Identify unique SampleIDs that are not "PASS" or "NA"
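# Based on Olink definitions, "NA" usually refers to excluded assays, not failed samples
# !is.na() also drops true missing values, which `%in%` would otherwise treat as "not PASS"
flagged_sample_ids <- dat |>
filter(!is.na(SampleQC), !SampleQC %in% c("NA", "PASS")) |>
pull(SampleID) |>
unique()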
# 3. Create a block-wise matrix of SampleQC status for flagged samples
# This helps visualize if a sample failed across all blocks or just one
flagged_samples_matrix <- dat |>
filter(SampleID %in% flagged_sample_ids) |>
group_by(SampleID, Block, SampleQC) |>
summarise(N = n(), .groups = 'drop') |>
mutate(status_label = paste0(SampleQC, " (n=", N, ")")) |>
group_by(SampleID, Block) |>
summarise(QC = stringr::str_flatten(status_label, collapse = ", "), .groups = 'drop') |>
tidyr::pivot_wider(names_from = Block, values_from = QC)
# Return results as a named list
list(
flagged_assays = flagged_assays,
flagged_sample_ids = flagged_sample_ids,
flagged_samples_matrix = flagged_samples_matrix
)
}
In inspect_olink_qc, filter(AssayType == "assay") removes all internal control assays (amp_ctrl, inc_ctrl, ext_ctrl) from the flagged-assay summary.
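Because `%in%` never returns NA, a SampleQC value that is a true missing value (rather than the string "NA") is not matched by c("NA", "PASS") and therefore passes a !SampleQC %in% ... filter as if it were a failure. Whether read_NPX() returns missing QC values as NA or as "NA" depends on the file, so this toy vector shows both cases and a form that handles them.

```r
sample_qc <- c("PASS", "WARN", NA, "NA", "FAIL")

# %in% never returns NA, so a true NA is "not in" the set and gets flagged
flag_naive <- !sample_qc %in% c("NA", "PASS")
# Excluding true NA explicitly keeps only genuine WARN/FAIL entries
flag_safe  <- !is.na(sample_qc) & !sample_qc %in% c("NA", "PASS")

flag_naive  # FALSE  TRUE  TRUE FALSE  TRUE
flag_safe   # FALSE  TRUE FALSE FALSE  TRUE
```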
qc_results <- inspect_olink_qc(rp2_npx)
qc_results$flagged_assays
qc_results$flagged_samples_matrix
### fontsize_row does not work, so I have to change fontsize
olink_heatmap_plot(rp2_npx, variable_row_list = "SampleType", variable_col_list = "Block", show_colnames = FALSE, cluster_cols = FALSE, center_scale = TRUE, fontsize = 6, fontsize_row = 2)
Warning: The assays OID40764, OID40857, OID42179, OID42282, OID42294, OID42327,
OID42394, OID42606, OID43241, OID43371, OID43738, OID43750, OID44351, OID44410,
OID44923 have NPX = NA for all samples. They will be excluded from the analysis
1 assay(s) exhibited assay QC warning. For more information see the AssayQC column.
NPX values of assays with an AssayQC status of FAIL, and of samples with a SampleQC status of FAIL, are set to NA. These are shown as black bars in the heatmap.
olink_dist_plot
Boxplots of NPX are another way to visualize the distribution of NPX values in each sample.
p <- rp2_npx |>
olink_dist_plot(color_g = "SampleQC") +
facet_grid(Block ~ PlateID, scales = "free_x", space = "free_x") +
theme(axis.text.x = element_text(angle = 90, size = 8, vjust = 0.5, hjust = 1))
Warning: The assays OID40764, OID40857, OID42179, OID42282, OID42294, OID42327,
OID42394, OID42606, OID43241, OID43371, OID43738, OID43750, OID44351, OID44410,
OID44923 have NPX = NA for all samples. They will be excluded from the analysis
1 assay(s) exhibited assay QC warning. For more information see the AssayQC column.
p$layers[[1]]$geom_params$outlier_gp$alpha <- 0.5
p$layers[[1]]$geom_params$outlier_gp$size <- 0.5
p$layers[[1]]$geom_params$outlier_gp$shape <- 21
p$layers[[1]]$geom_params$outlier_gp$colour <- "transparent"
p$layers[[1]]$geom_params$outlier_gp$fill <- "black"
p
olink_qc_plot() generates a facet plot per Panel, built with ggplot2::ggplot, ggplot2::geom_point, and stats::IQR, that plots the IQR against the median NPX of every sample. Horizontal dashed lines indicate ±IQR_outlierDef standard deviations from the mean IQR (default 3); vertical dashed lines indicate ±median_outlierDef standard deviations from the mean sample median (default 3).
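The outlier rule described above can be sketched in base R: compute each sample's IQR and median NPX, then flag samples that fall more than k standard deviations from the across-sample mean of either statistic. flag_qc_outliers is a hypothetical helper; it handles one panel at a time and does not reproduce any other details of the OlinkAnalyze implementation. The data are simulated, with one sample deliberately shifted.

```r
# Flag samples whose IQR or median NPX lies more than k SDs from the
# across-sample mean (hypothetical helper mirroring the rule above)
flag_qc_outliers <- function(npx, iqr_k = 3, median_k = 3) {
  s_iqr <- apply(npx, 1, IQR, na.rm = TRUE)      # per-sample IQR
  s_med <- apply(npx, 1, median, na.rm = TRUE)   # per-sample median
  iqr_out <- abs(s_iqr - mean(s_iqr)) > iqr_k * sd(s_iqr)
  med_out <- abs(s_med - mean(s_med)) > median_k * sd(s_med)
  data.frame(sample = rownames(npx), IQR = s_iqr, median = s_med,
             outlier = iqr_out | med_out)
}

# Simulated data: 20 samples x 50 assays, with sample S20 shifted upward
set.seed(1)
npx <- matrix(rnorm(20 * 50, mean = 5), nrow = 20,
              dimnames = list(paste0("S", 1:20), paste0("OID", 1:50)))
npx["S20", ] <- npx["S20", ] + 4

res <- flag_qc_outliers(npx)
res[res$outlier, ]
```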
### Keep study samples only: negative, plate, and sample controls are excluded
qc <- olink_qc_plot(rp2_npx |> filter(SampleType == "SAMPLE"), color_g = "PlateID", IQR_outlierDef = 3, median_outlierDef = 3)
Warning: The assays OID40764, OID40857, OID42179, OID42282, OID42294, OID42327,
OID42394, OID42606, OID43241, OID43371, OID43738, OID43750, OID44351, OID44410,
OID44923 have NPX = NA for all samples. They will be excluded from the analysis
1 assay(s) exhibited assay QC warning. For more information see the AssayQC column.
qc
outliers <- qc$data %>% filter(Outlier == 1)
outliers
olink_pca_plot(rp2_npx |> filter(SampleType == "SAMPLE"), color_g = "SampleQC")
Warning: The assays OID40764, OID40857, OID42179, OID42282, OID42294, OID42327,
OID42394, OID42606, OID43241, OID43371, OID43738, OID43750, OID44351, OID44410,
OID44923 have NPX = NA for all samples. They will be excluded from the analysis
1 assay(s) exhibited assay QC warning. For more information see the AssayQC column.
Warning in npxProcessing_forDimRed(df = df, color_g = color_g, drop_assays =
drop_assays, : There are 4356 assay(s) that were imputed by their medians.
olink_pca_plot(rp2_npx |> filter(SampleType == "SAMPLE"), color_g = "PlateID")
Warning: The assays OID40764, OID40857, OID42179, OID42282, OID42294, OID42327,
OID42394, OID42606, OID43241, OID43371, OID43738, OID43750, OID44351, OID44410,
OID44923 have NPX = NA for all samples. They will be excluded from the analysis
1 assay(s) exhibited assay QC warning. For more information see the AssayQC column.
Warning in npxProcessing_forDimRed(df = df, color_g = color_g, drop_assays =
drop_assays, : There are 4356 assay(s) that were imputed by their medians.
olink_umap_plot(rp2_npx |> filter(SampleType == "SAMPLE"), color_g = "SampleQC")
Loading required namespace: umap
Warning: The assays OID40764, OID40857, OID42179, OID42282, OID42294, OID42327,
OID42394, OID42606, OID43241, OID43371, OID43738, OID43750, OID44351, OID44410,
OID44923 have NPX = NA for all samples. They will be excluded from the analysis
1 assay(s) exhibited assay QC warning. For more information see the AssayQC column.
Warning in npxProcessing_forDimRed(df = df, color_g = color_g, drop_assays =
drop_assays, : There are 4356 assay(s) that were imputed by their medians.
olink_umap_plot(rp2_npx |> filter(SampleType == "SAMPLE"), color_g = "PlateID")
Warning: The assays OID40764, OID40857, OID42179, OID42282, OID42294, OID42327,
OID42394, OID42606, OID43241, OID43371, OID43738, OID43750, OID44351, OID44410,
OID44923 have NPX = NA for all samples. They will be excluded from the analysis
1 assay(s) exhibited assay QC warning. For more information see the AssayQC column.
Warning in npxProcessing_forDimRed(df = df, color_g = color_g, drop_assays =
drop_assays, : There are 4356 assay(s) that were imputed by their medians.
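The warnings above come from the preprocessing that olink_pca_plot() and olink_umap_plot() run before dimensionality reduction: missing values are imputed by the assay median (4356 assays here). A minimal base-R sketch of that idea, median imputation followed by prcomp(), on a simulated matrix; impute_median is a hypothetical helper, and the OlinkAnalyze internals (npxProcessing_forDimRed) may differ in details such as which assays are dropped and how values are scaled.

```r
# Simulated sample x assay NPX matrix with a few missing values
set.seed(42)
npx <- matrix(rnorm(10 * 6), nrow = 10,
              dimnames = list(paste0("S", 1:10), paste0("OID", 1:6)))
npx[c(2, 7), 3] <- NA
npx[5, 6] <- NA

# Replace each assay's missing values with that assay's median
impute_median <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    if (any(miss)) m[miss, j] <- median(m[, j], na.rm = TRUE)
  }
  m
}

npx_imp <- impute_median(npx)
pca <- prcomp(npx_imp, center = TRUE, scale. = TRUE)
head(pca$x[, 1:2])  # sample scores on PC1 and PC2
```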
NPX values below the limit of detection (LOD) are not reliable. In the OlinkAnalyze R package, the olink_lod() function provides two primary methods for adding LOD information to Olink Explore datasets:
Negative Control LOD (NCLOD)
This method calculates LOD values based specifically on the negative controls included within your own study’s dataset.
Fixed LOD (FixedLOD)
This method uses predetermined LOD values provided by Olink, rather than calculating them from your current data.
For FixedLOD, the Olink-provided LOD file is supplied through the lod_file_path argument of the function. The latest fixed LOD file is available for download here (see Table 4 (a)). This file contains reference data for DataAnalysisRefIDs (DarIDs) of the form DX00YY, where X is the block (1–8) and YY is the reference version (currently 01 to 14). In the rp2 study, the DataAnalysisRefIDs (data_analysis_ref_id) range from D10007 to D80007, i.e., reference version 07 for blocks 1 through 8.
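The DX00YY naming scheme can be decoded mechanically, for example to check which block and reference version a dataset uses before choosing a fixed LOD file. parse_dar_id is a hypothetical helper that assumes exactly the pattern described above ("D", one block digit 1 to 8, "00", then a two-digit version); IDs that do not match return NA.

```r
# Decode DarIDs of the form DX00YY into block (X) and reference version (YY)
parse_dar_id <- function(id) {
  ok <- grepl("^D[1-8]00[0-9]{2}$", id)
  data.frame(
    DataAnalysisRefID = id,
    block   = ifelse(ok, as.integer(substr(id, 2, 2)), NA_integer_),
    version = ifelse(ok, as.integer(substr(id, 5, 6)), NA_integer_)
  )
}

parse_dar_id(c("D10007", "D40007", "D80007", "D10001"))
```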
The detailed use of olink_lod() is described here.
Because rp2 does not include enough negative controls, only the fixed LOD can be applied here.
fixedLOD_fn <- "data/Explore HT_Fixed LOD.csv"
# read.delim (dec = ".") rather than read.delim2 (dec = ","), so the LOD columns are parsed as numbers
fixedlod.dat <- read.delim(fixedLOD_fn, sep = ";")
# Show the first rows of the fixed LOD file as a table
knitr::kable(head(fixedlod.dat))
rp2_lod <- rp2_npx |>
olink_lod(lod_file_path = fixedLOD_fn, lod_method = "FixedLOD")
Warning: The assays OID40764, OID40857, OID42179, OID42282, OID42294, OID42327,
OID42394, OID42606, OID43241, OID43371, OID43738, OID43750, OID44351, OID44410,
OID44923 have NPX = NA for all samples. They will be excluded from the analysis
1 assay(s) exhibited assay QC warning. For more information see the AssayQC column.
knitr::kable(head(rp2_lod))
| OlinkID | AssayType | UniProt | Assay | Panel | Block | DataAnalysisRefID | BimodalDistribution | LODNPX | LODCount | LODMethod | Version |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OID40001 | assay | Q9NUQ8 | ABCF3 | Explore_HT | 1 | D10001 | FALSE | 0.976509184936807 | 2086 | lod_npx | 9.0.0 |
| OID40002 | assay | Q8WTS1 | ABHD5 | Explore_HT | 1 | D10001 | FALSE | 2.20770109787743 | 564 | lod_npx | 9.0.0 |
| OID40003 | assay | Q96I13 | ABHD8 | Explore_HT | 1 | D10001 | FALSE | 6.22270040477226 | 150 | lod_count | 9.0.0 |
| OID40004 | assay | Q9H845 | ACAD9 | Explore_HT | 1 | D10001 | FALSE | 7.2585583689779 | 178 | lod_count | 9.0.0 |
| OID40005 | assay | Q6NUN0 | ACSM5 | Explore_HT | 1 | D10001 | FALSE | 2.67065654184691 | 572 | lod_npx | 9.0.0 |
| OID40006 | assay | Q8IZF7 | ADGRF2 | Explore_HT | 1 | D10001 | FALSE | 4.91106891907256 | 386 | lod_npx | 9.0.0 |
| SampleID | SampleType | WellID | PlateID | DataAnalysisRefID | OlinkID | UniProt | Assay | AssayType | Panel | Block | Count | ExtNPX | NPX | Normalization | PCNormalizedNPX | AssayQC | SampleQC | SoftwareVersion | SoftwareName | PanelDataArchiveVersion | PreProcessingVersion | PreProcessingSoftware | InstrumentType | IntraCV | InterCV | SampleBlockQCWarn | SampleBlockQCFail | BlockQCFail | AssayQCWarn | LOD | PCNormalizedLOD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IO_3862_1158_1 | SAMPLE | A7 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 17461 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 | NA | NA |
| IR_8487_1158_1 | SAMPLE | A8 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 16613 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 | NA | NA |
| KF_0956_1158_1 | SAMPLE | A9 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 2220 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 | NA | NA |
| UM_0172_1035_1 | SAMPLE | A10 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 14407 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 | NA | NA |
| IA_7060_1158_1 | SAMPLE | A11 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 1231 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 1 | 1 | 1 | 0 | NA | NA |
| Sample_12 | SAMPLE_CONTROL | A12 | RP0084-045_A | D10007 | OID45511 | EXT1 | Extension control 1 | ext_ctrl | Explore_HT | 1 | 17385 | 0 | 0 | Intensity | 0 | NA | PASS | 1.3.0 | NPX Map | 1.3.0 | 5.1.0 | ngs2counts | Illumina NovaSeq X Plus | NaN | NaN | 0 | 1 | 1 | 0 | NA | NA |
RP2 Dataset: Assays with NPX Values Below LOD
sample_na_cnts <- rp2_lod |>
filter(AssayType=="assay") |>
group_by(SampleID, SampleType, PlateID) |>
summarise(
N=sum(NPX < LOD, na.rm=T),
total_assays=n(),
non_nas = sum(!is.na(NPX)),
.groups="drop"
) |>
group_by(PlateID) |>
summarise(
Total=first(total_assays),
NonNA=first(non_nas),
Q1=quantile(N, 0.25, na.rm=T),
Median_Below_LOD=quantile(N, 0.50, na.rm=T),
Q3=quantile(N, 0.75, na.rm=T),
.groups="drop"
)
sample_na_cnts
library(readxl)
.subj <- read_excel("data/Olink HCC Sample Details.xlsx", skip=1, sheet=1) |>
select(1:6) |>
setNames(c("subj_id", "aliquot_id", "subsample_id", "subj_type", "sex", "tissue_type"))
New names:
• `Subject` -> `Subject...1`
• `` -> `...13`
• `Subject` -> `Subject...14`
.sam <- read_excel("data/Olink HCC Sample Details.xlsx", sheet=4) |>
setNames(c("project", "plate_id", "row", "col", "well", "subj_id", "aliquot_id", "subsample_id", "olink_sample_id")) |>
select(-row, -col, -well)
### We expect there are 48 samples in CS036024 (Q-13387-3) and 124 samples in RP0084-045
.sam %$% table(project)
project
CS036024 (Q-13387-3) RP0084-045
48 124
### find out how many CGR samples matched in d1 and rp2
rp2_sam_full <- rp2_sam |>
inner_join(.sam, by=join_by(SampleID == olink_sample_id)) |>
inner_join(.subj |> select(subj_id, subj_type, sex, tissue_type))
Joining with `by = join_by(subj_id)`
### 20 external control samples are not included, so there are only 124 rows
rp2_sam_full
# Not run: all_fix and cgr_all are not defined in this document
# my_olink_pca(all_fix %>% filter(SampleID %in% cgr_all$SampleID), cgr_all, col_by = "subj_type", panel_by = "plate", ncol = 1)
# ggsave("graphs/Q7.my_pca.all_fix.type.pdf", width = 10, height = 8)
run_olink_pca_workflow(rp2_npx %>% filter(SampleID %in% rp2_sam_full$SampleID), rp2_sam_full, color_by="plate", facet_by = NULL)
run_olink_pca_workflow(rp2_npx %>% filter(SampleID %in% rp2_sam_full$SampleID), rp2_sam_full, color_by="subj_type", facet_by="plate")
run_olink_pca_workflow(rp2_npx %>% filter(SampleID %in% rp2_sam_full$SampleID), rp2_sam_full, color_by="sex", facet_by="plate")
We use the helper calc_intraCV_nse to calculate IntraCV for any replicated samples, including the Olink SAMPLE_CONTROL replicates.
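Because NPX is on a log2 scale, a CV is usually computed after converting the replicate SD back to the linear scale. One commonly used formula for log2 data is the log-normal (geometric) CV, \(100\sqrt{e^{(\mathrm{SD}\cdot\ln 2)^2} - 1}\). The source of calc_intraCV_nse is not shown here, so whether it uses exactly this formula is an assumption; cv_from_log2 below is a hypothetical stand-in, applied to made-up replicate values.

```r
# Log-normal %CV from replicate NPX (log2 scale); NA values are dropped,
# and fewer than two remaining replicates gives NA
cv_from_log2 <- function(npx) {
  npx <- npx[!is.na(npx)]
  if (length(npx) < 2) return(NA_real_)
  s <- sd(npx)
  100 * sqrt(exp((s * log(2))^2) - 1)
}

replicates <- c(5.10, 5.25, 4.95)  # hypothetical NPX of one assay in three replicate wells
cv_from_log2(replicates)           # about 10.4 (%)
```

Since the formula depends only on the SD, adding a constant to every NPX value (as normalization does) leaves the CV unchanged.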
rp2_npx |>
group_by(PlateID, SampleID) |>
summarise(
N_IntraCV=sum(!is.na(IntraCV)),
N_InterCV=sum(!is.na(InterCV)),
.groups = "drop"
)
### To better evaluate Olink HT performance, we calculate intra- and inter-CV for all replicated samples, including Sample Controls
rp2_intraCV <- calc_intraCV_nse(rp2_lod, rp2_sam_full, rp2_sam)
### Our calculation should match the Olink IntraCV for those 291 assays
.dat <- rp2_intraCV |>
filter(PlateID=="PlateLayout_RP0084-045_B_ExploreHT (3)" & subsample_id == "SAMPLE_CONTROL") |>
select(OlinkID, newIntraCV=IntraCV) |>
inner_join(
rp2_npx |>
filter(PlateID=="PlateLayout_RP0084-045_B_ExploreHT (3)" & !is.na(IntraCV))|>
select(OlinkID, IntraCV) |>
unique() # all samples have the same intraCV in the plate
)
Joining with `by = join_by(OlinkID)`
### all matched
.dat %$% table(abs(newIntraCV - IntraCV) < 1e-5, useNA="ifany")
TRUE
291
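A tolerance check on differences should use the absolute value: a one-sided comparison such as (a - b) < 1e-5 also accepts every case where a is smaller than b. The toy vectors below (hypothetical CVs) illustrate the difference.

```r
new_cv <- c(10.0, 12.5, 8.0)  # hypothetical recomputed CVs
ref_cv <- c(10.0, 15.0, 8.0)  # hypothetical reference CVs

(new_cv - ref_cv) < 1e-5      # TRUE TRUE TRUE: a negative difference slips through
abs(new_cv - ref_cv) < 1e-5   # TRUE FALSE TRUE: only true matches pass
```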
rp2_intraCV.median <- rp2_intraCV |>
mutate(
SampleType = ifelse(grepl("_CONTROL",subsample_id), subsample_id, "SAMPLE")
) |>
filter(AssayType == "assay" ) |>
group_by(OlinkID, PlateID, SampleType) |>
summarise(IntraCV_median = median(IntraCV, na.rm=T), .groups="drop")
rp2_intraCV.sum_all <- rp2_intraCV.median |>
group_by(PlateID, SampleType) |>
summarise(
N=n(),
Q1=quantile(IntraCV_median, 0.25, na.rm=T),
Q2=quantile(IntraCV_median, 0.50, na.rm=T),
Q3=quantile(IntraCV_median, 0.75, na.rm=T),
.groups="drop"
)
# Use kable to render the data frame as a formal table
knitr::kable(rp2_intraCV.sum_all)
| PlateID | SampleType | N | Q1 | Q2 | Q3 |
|---|---|---|---|---|---|
| PlateLayout_RP0084-045_B_ExploreHT (3) | NEGATIVE_CONTROL | 5401 | 9.691049 | 23.57762 | 52.62142 |
| PlateLayout_RP0084-045_B_ExploreHT (3) | PLATE_CONTROL | 5401 | 8.473897 | 16.10635 | 31.50751 |
| PlateLayout_RP0084-045_B_ExploreHT (3) | SAMPLE | 5401 | 9.623662 | 17.26379 | 28.80782 |
| PlateLayout_RP0084-045_B_ExploreHT (3) | SAMPLE_CONTROL | 5401 | 7.073281 | 13.69899 | 26.84559 |
| RP0084-045_A | NEGATIVE_CONTROL | 5401 | 10.222677 | 23.51408 | 51.76379 |
| RP0084-045_A | PLATE_CONTROL | 5401 | 7.913498 | 15.44079 | 29.94569 |
| RP0084-045_A | SAMPLE | 5401 | 8.084016 | 14.22349 | 25.89999 |
| RP0084-045_A | SAMPLE_CONTROL | 5401 | 7.609023 | 14.69867 | 29.18682 |
### Fig1
rp2_intraCV.median |>
mutate(IntraCV_median=ifelse(IntraCV_median> 50, 50, IntraCV_median)) |>
ggplot(aes(x=IntraCV_median, fill=PlateID)) +
geom_density( alpha=0.2) +
facet_grid(. ~ SampleType) +
theme(legend.position = "bottom")
### Fig 2
.dat <- rp2_lod |>
filter(SampleType=="SAMPLE") |>
group_by(OlinkID, PlateID) |>
summarise(
Perc_gtLOD = mean(NPX>LOD, na.rm=T),
.groups='drop'
) |>
inner_join(
rp2_intraCV.median |> filter(SampleType == "SAMPLE")
)
Joining with `by = join_by(OlinkID, PlateID)`
# Per-assay median IntraCV vs. fraction of samples above LOD, one panel per plate
ggplot(data = .dat, aes(x = Perc_gtLOD, y = IntraCV_median, col=PlateID)) +
geom_point(size = 1, alpha = 0.8) +
geom_smooth(method = "lm",
formula = y ~ x,
color = "black",
linetype = "dashed",
linewidth = 0.8,
# scale_y_log10() is applied before the stat is computed, so this
# linear fit is of log10(IntraCV_median) on Perc_gtLOD
se = FALSE) +
scale_y_log10() + facet_wrap(~PlateID) + theme(legend.position = "bottom")
As shown in Figure 12 (b), the CV of Olink Explore HT assays is strongly inversely correlated with protein detectability (the percentage of samples above the LOD): the more detectable the protein, the better the precision. To improve data quality, we recoded NPX values below the LOD as NA. We then regenerated the intra-CV summary table, mirroring the earlier analysis in which all NPX values were included, to evaluate the resulting improvement in precision.
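The effect of the recoding can be illustrated on a single assay. The sketch below uses made-up replicate NPX values and a hypothetical LOD, masks values below the LOD as NA, and compares the log-normal %CV before and after; cv_from_log2 is a hypothetical helper, and whether calc_intraCV_nse (with treat_LOD_as_NA = TRUE) behaves identically is an assumption.

```r
# Log-normal %CV from log2-scale replicates; needs at least two non-NA values
cv_from_log2 <- function(npx) {
  npx <- npx[!is.na(npx)]
  if (length(npx) < 2) return(NA_real_)
  100 * sqrt(exp((sd(npx) * log(2))^2) - 1)
}

npx <- c(6.2, 6.4, 6.1, 1.2, 0.4)  # hypothetical replicate NPX for one assay
lod <- 2.0                         # hypothetical fixed LOD for that assay

npx_masked <- ifelse(npx < lod, NA, npx)  # recode below-LOD values as NA
cv_all    <- cv_from_log2(npx)
cv_masked <- cv_from_log2(npx_masked)
c(all = cv_all, masked = cv_masked)
```

In this sketch an assay with fewer than two above-LOD replicates gets no CV at all, which is one way the N column can shrink after masking (for example, from 5401 to 4017 for SAMPLE on plate B).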
rp2_intraCV_lod <- calc_intraCV_nse(rp2_lod, rp2_sam_full, rp2_sam, treat_LOD_as_NA = T)
rp2_intraCV.median_lod <- rp2_intraCV_lod |>
mutate(
SampleType = ifelse(grepl("_CONTROL",subsample_id), subsample_id, "SAMPLE")
) |>
filter(AssayType == "assay" ) |>
group_by(OlinkID, PlateID, SampleType) |>
summarise(IntraCV_median = median(IntraCV, na.rm=T), .groups="drop")
rp2_intraCV.sum_lod <- rp2_intraCV.median_lod |>
group_by(PlateID, SampleType) |>
summarise(
N=n(),
Q1=quantile(IntraCV_median, 0.25, na.rm=T),
Q2=quantile(IntraCV_median, 0.50, na.rm=T),
Q3=quantile(IntraCV_median, 0.75, na.rm=T),
.groups="drop"
)
# Use kable to render the data frame as a formal table
knitr::kable(rp2_intraCV.sum_lod)
| PlateID | SampleType | N | Q1 | Q2 | Q3 |
|---|---|---|---|---|---|
| PlateLayout_RP0084-045_B_ExploreHT (3) | NEGATIVE_CONTROL | 750 | 6.580434 | 14.896913 | 30.78217 |
| PlateLayout_RP0084-045_B_ExploreHT (3) | PLATE_CONTROL | 2672 | 5.659866 | 8.527416 | 13.27701 |
| PlateLayout_RP0084-045_B_ExploreHT (3) | SAMPLE | 4017 | 7.297420 | 9.780197 | 14.18543 |
| PlateLayout_RP0084-045_B_ExploreHT (3) | SAMPLE_CONTROL | 2815 | 5.034002 | 8.330916 | 14.09463 |
| RP0084-045_A | NEGATIVE_CONTROL | 670 | 6.816213 | 14.583087 | 29.06493 |
| RP0084-045_A | PLATE_CONTROL | 2657 | 5.143483 | 8.083143 | 13.17757 |
| RP0084-045_A | SAMPLE | 3482 | 5.731077 | 8.538082 | 13.35158 |
| RP0084-045_A | SAMPLE_CONTROL | 2799 | 5.348042 | 8.918910 | 15.37332 |